Abstract

Finite alphabets of at least three letters permit the construction of square-free words of
infinite length. We show that the entropy density of the set of square-free words is strictly
positive and derive reasonable lower and upper bounds. Finally, we present an approximate
formula which is asymptotically exact with rapid convergence in the number of letters.

Résumé

It is possible to construct square-free words of infinite length over an alphabet with at
least three letters. We prove that the entropy of the language of square-free words over
such an alphabet is strictly positive, and we bracket it between reasonable lower and upper
bounds. Finally, we give an approximate expression for the entropy which is asymptotically
correct and converges rapidly as the number of letters of the alphabet tends to infinity.
1 Introduction

As is well known, deterministic rules (like substitution rules etc.) can only result in tilings
or discrete structures with vanishing entropy density [1]; the famous Penrose tiling is an
example of this phenomenon. This tiling and many others appear in the description of
so-called quasicrystals for a very good reason: locally finite discrete structures (such as
tilings with only finitely many prototiles) provide useful cell models for the description of
the ordered state, not only crystalline but also quasicrystalline [?]. In the latter case, one is
particularly interested in models with finite (i.e., non-vanishing) entropy density, as this is
one possible mechanism to explain non-periodic order through entropic stabilization. In order
to combine long-range order with a decent amount of randomness, so-called random tiling
models have been studied in considerable detail [?], and there exists a reasonable qualitative
understanding.

† Address after April 1996: Institut für Physik, Technische Universität, D-09130 Chemnitz, Germany

In view of these remarks, it is clear that exactly solvable random tiling models are of
interest. The standard case in one dimension is trivial, as it is essentially equivalent to a
Bernoulli scheme (cf. Ref. [?]); this can hardly be called ordered in any sense. In two
dimensions, we know of only a few solved cases (such as the Fisher–Kasteleyn domino model
(see App. E in Ref. [?]), the hexagonal random tiling model [4] or the square-triangle random
tiling model [?]); many attempts to improve this situation have failed so far, and the
situation in higher dimensions is even worse. So the question arises whether one can find
other examples in 1D that are more restrictive than the Bernoulli scheme but still provide
reasonable toy models of (partially) ordered states. Here one can scan the vast number of
examples of automata and other sequences [?], but hardly any of them yields an interesting
model with positive entropy density.

One interesting class, however, is provided by infinite words in a finite alphabet that
avoid the repetition of certain patterns. The simplest such case is the ensemble of square-free
words, first studied by Thue [9], [10] and later on reinvestigated many times, see e.g.
Refs. [11]–[22]. In particular, the combinatorial problems in the treatment of these systems
are very interesting. It turned out in a series of independent articles [17], [18], [19] that the
entropy density of square-free words in three letters is positive. It was conjectured that the
existing upper bounds were much closer to the actual value of the entropy density than the
lower bounds, and that it is close to 0.3. We will see later that the numerical value is in fact
about 0.263719.

In this article, we will summarize some of the properties of the ensemble of square-free
words in an alphabet $\mathcal{A}$ with finitely many letters, $x$ say. (For a general background, we
refer to Ref. [?], although we shall use slightly different notation here as a compromise
between mathematical and physical literature.) We will concentrate on the case $x = 3$ for a
while before we treat the general case. We present various rigorous results but also include
new numerical calculations which finally guide us to the conjecture of an asymptotically
correct formula for the entropy density of square-free words in $x$ letters, which turns out to
be amazingly accurate already for small $x$.



2 Basic setup and inequalities 

Let $\mathcal{A} = \{a_1, a_2, \ldots, a_x\}$ be a finite alphabet with $x$ different letters (so $x \in \mathbb{N}$ is an integer).
Then, by $\mathcal{A}^*$, we denote the set of all words of finite length in the letters of $\mathcal{A}$. This is
a monoid with concatenation of words as operation and the empty word as neutral element
[14]. Later on, we will restrict ourselves to more interesting subsets of $\mathcal{A}^*$. If we
write $\ell(w)$ for the length of a word (thus $\ell(w) \in \mathbb{N}_0$ for all $w \in \mathcal{A}^*$), we can define (finite)
subsets of $\mathcal{A}^*$ by

$$\mathcal{A}_n := \{\, w \in \mathcal{A}^* \mid \ell(w) = n \,\}. \tag{2.1}$$

Here, $\mathcal{A}_0$ consists only of the empty word and

$$\mathcal{A}^* = \bigcup_{n=0}^{\infty} \mathcal{A}_n, \tag{2.2}$$

which is sometimes also called the dictionary of the trivial language (we have imposed no rules
yet, so the above language simply consists of all finite words in the alphabet $\mathcal{A}$). We will
generalize this in a moment, but consider only situations where the number of words of length
$n$ behaves in such a way that its logarithm divided by $n$ defines a convergent sequence.
This is obviously so in the above case, where we have

$$|\mathcal{A}_n| = x^n. \tag{2.3}$$

Now, we can define the entropy $s = s(x)$ (it is actually an entropy density) through

$$s(x) = \lim_{n\to\infty} \frac{\log(|\mathcal{A}_n|)}{n}. \tag{2.4}$$

In our present case, we have of course

$$s(x) = \log(x), \tag{2.5}$$

which is a measure for the growth rate of $|\mathcal{A}_n|$ in $n$: $|\mathcal{A}_n| = \exp(n \cdot s(x))$.

Now, let us introduce the concept of repeat-free or square-free words. A square is a word of
the form $uu$ with $u$ non-empty. A word $w$ is called square-free if neither $w$ nor any of its
subwords is a square; otherwise $w$ is said to contain a square. So, in an alphabet with two
letters, $\mathcal{A} = \{a, b\}$ say, $a$, $b$, $ab$, $ba$, $aba$ and $bab$ are the only non-empty square-free words.
Consequently, as there are only finitely many square-free words in two letters, the
corresponding entropy is zero. To make this more precise, let us define

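$$\mathcal{A}_n = \mathcal{A}_n^+ \cup \mathcal{A}_n^-, \tag{2.6}$$

where $\mathcal{A}_n^-$ ($\mathcal{A}_n^+$) denotes the subset of all square-free words (square-containing words) of
length $n$, and the right-hand side of (2.6) is clearly the union of two disjoint sets.
Consequently,

$$|\mathcal{A}_n^+| + |\mathcal{A}_n^-| = |\mathcal{A}_n| = x^n, \tag{2.7}$$

and we introduce the abbreviation

$$\omega_n^{\pm}(x) := |\mathcal{A}_n^{\pm}| \tag{2.8}$$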

for convenience. There is always an implicit dependence on x, the (finite) number of letters 
of the alphabet A, but we will often suppress it when it is not needed. 
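These definitions translate directly into a brute-force count. The following Python sketch is our own illustration, not the authors' program; the helper names `ends_with_square` and `square_free_words` are hypothetical. It grows square-free words letter by letter; since appending a letter to a square-free word can only create a square at the end, only suffixes need to be tested.

```python
def ends_with_square(w: str) -> bool:
    """True if w ends with a square uu, u non-empty."""
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def square_free_words(alphabet: str, n: int) -> list[str]:
    """All square-free words of length n over `alphabet`.

    Every square-free word of length n arises from one of length n - 1 by
    appending a letter, and any square created in that step is a suffix.
    """
    words = [""]
    for _ in range(n):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
    return words


# Two letters: only finitely many square-free words exist (zero entropy).
binary = [w for n in range(6) for w in square_free_words("ab", n)]
print(binary)  # ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab']

# Three letters: omega_n^-(3) for n = 0, ..., 8 (compare Table 1).
counts = [len(square_free_words("abc", n)) for n in range(9)]
print(counts)
```

The same suffix test is reused in the sketches further below.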

One can now derive several properties of these numbers. It is clear from Eq. (2.6) that
we have

$$\omega_n^+(x) + \omega_n^-(x) = x^n \tag{2.9}$$

and also the initial conditions

$$\omega_0^-(x) = 1, \qquad \omega_1^-(x) = x, \qquad \omega_0^+(x) = \omega_1^+(x) = 0. \tag{2.10}$$

Observe also that $\omega_2^+(x) = x$, as there are precisely $x$ possibilities for words of type $aa$ etc.
If now, for fixed $x$, $\omega_n^-(x) = 0$ for some $n$, we must have $\omega_{n+m}^+(x) = x^{n+m}$ for all $m \ge 0$.
This is so because no continuation of a square-containing word can become square-free, and
we can then use Eq. (2.9). This situation occurs with $x = 1$ and $x = 2$, but not with any larger $x$.
We can strengthen this type of argument to obtain, for $n > 0$,

$$\omega_{n+1}^+(x) \ge x \cdot \omega_n^+(x) + \omega_n^-(x). \tag{2.11}$$

This is so because every word of $\mathcal{A}_n^+$ becomes, by adding one arbitrary letter of the alphabet
($x$ possibilities), a word of $\mathcal{A}_{n+1}^+$, while every word of $\mathcal{A}_n^-$ can be turned into one of $\mathcal{A}_{n+1}^+$ at
least by repeating its last letter.

Now, with the initial conditions (2.10), repeated application of inequality (2.11) shows
($n > 1$)

$$\omega_n^+(x) \ge x^{n-1}. \tag{2.12}$$
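Relations (2.9) to (2.12) can be checked numerically for small $n$. The sketch below is our own illustration. Representing the alphabet by the first $x$ lowercase letters is an arbitrary choice, and `omega_minus` and `check_square_containing` are hypothetical names. It counts $\omega_n^-(x)$ by brute force, obtains $\omega_n^+(x)$ from (2.9), and asserts (2.10) to (2.12).

```python
from string import ascii_lowercase


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def omega_minus(x: int, n_max: int) -> list[int]:
    """omega_n^-(x) for n = 0, ..., n_max over the first x lowercase letters."""
    alphabet = ascii_lowercase[:x]
    words, counts = [""], [1]
    for _ in range(n_max):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
        counts.append(len(words))
    return counts


def check_square_containing(x: int, n_max: int) -> list[int]:
    minus = omega_minus(x, n_max)
    plus = [x**n - m for n, m in enumerate(minus)]        # Eq. (2.9)
    assert minus[:2] == [1, x] and plus[:2] == [0, 0]     # Eq. (2.10)
    assert plus[2] == x                                   # the x words aa, bb, ...
    for n in range(1, n_max):
        assert plus[n + 1] >= x * plus[n] + minus[n]      # Eq. (2.11), n > 0
    for n in range(2, n_max + 1):
        assert plus[n] >= x ** (n - 1)                    # Eq. (2.12), n > 1
    return plus


print(check_square_containing(3, 10))
```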

It is possible to define the entropy $s^+$ of the dictionary of square-containing words, $\mathcal{A}^+$,
which exists as an ordinary limit. We obtain

Proposition 1 The entropy of $\mathcal{A}^+$ equals that of $\mathcal{A}^*$: $s^+(x) = s(x) = \log(x)$.
The proof is a direct application of inequality (2.12):

$$s^+(x) = \lim_{n\to\infty} \frac{\log(\omega_n^+(x))}{n} \ge \log(x) \cdot \lim_{n\to\infty} \frac{n-1}{n} = \log(x).$$

On the other hand, we have $s^+(x) \le s(x) = \log(x)$, from which the statement follows. □

For the other entropy, $s^-(x)$, we have to prove existence as an ordinary limit first. We
do that in a slightly more general form, following an argument given by Pleasants [23].



Lemma 1 Let $a(n)$ be a sequence of positive integers with $a(m+n) \le a(m) \cdot a(n)$. Then
the sequence defined through $h(n) := \log(a(n))/n$ is convergent.

PROOF: Let us take two integers $N > n$ related by $N = qn + r$ with $0 \le r < n$. We then
have, with $x := a(1)$ (and with the factor $a(r)$ absent if $r = 0$),

$$a(N) \le (a(n))^q \cdot a(r) \le (a(n))^q \cdot x^r \tag{2.13}$$

and, consequently,

$$\frac{\log(a(N))}{N} \le \frac{q \cdot \log(a(n))}{N} + \frac{r \cdot \log(x)}{N} \le \frac{\log(a(n))}{n} + \frac{\log(x)}{q}. \tag{2.14}$$

This means $h(N) \le h(n) + \frac{1}{q}\log(x)$ for any $N = qn + r$ with $0 \le r < n$. But then, we can fix $n$
and take the limit $N \to \infty$, which also implies $q \to \infty$. This gives $\limsup_{N\to\infty} h(N) \le h(n)$.
The last inequality is valid for all $n \in \mathbb{N}$, so we also have

$$\limsup_{N\to\infty} h(N) \le \liminf_{n\to\infty} h(n), \tag{2.15}$$

which means that both must be equal and the limit exists. □

If $a(n)$ is only a sequence of non-negative integers, but still with $a(m+n) \le a(m)\,a(n)$,
one either has $a(n) > 0$ for all $n$ or, if $a(n_0) = 0$ for some $n_0$, one has $a(n) = 0$ also for all
$n > n_0$ due to submultiplicativity. In the latter case, we follow the usual convention and
define $h(n) = 0$ whenever $a(n) = 0$, which results in $\lim_{n\to\infty} h(n) = 0$, i.e. vanishing entropy.

It is clear that the above Lemma is not the most general formulation of the statement,
but it is sufficient for our needs. The language of square-free words is subword-closed, i.e.,
every subword of a square-free word is again square-free [23]. Consequently, cutting a
square-free word of length $m + n$ into its prefix of length $m$ and its suffix of length $n$ shows
that $\omega_{m+n}^-(x) \le \omega_m^-(x)\,\omega_n^-(x)$, so the numbers $\omega_n^-(x)$ are such that the lemma applies.
We have thus shown

Proposition 2 The entropy $s^-(x)$ of $\mathcal{A}^-(x)$ exists as a limit.
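Proposition 2 rests on the submultiplicativity $\omega_{m+n}^- \le \omega_m^- \omega_n^-$. The short check below is an illustration only, and its helper names are ours. It confirms this inequality on the exact counts for $x = 3$ and lengths up to 20, and it tabulates $h(n) = \log(\omega_n^-(3))/n$. In the standard form of Lemma 1 (Fekete's lemma) the limit equals $\inf_n h(n)$, so every $h(n)$ is itself an upper bound for the entropy.

```python
import math


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def omega_minus(alphabet: str, n_max: int) -> list[int]:
    words, counts = [""], [1]
    for _ in range(n_max):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
        counts.append(len(words))
    return counts


N = 20
omega = omega_minus("abc", N)

# A square-free word of length m + n splits into a square-free prefix of
# length m and a square-free suffix of length n.
assert all(omega[m + n] <= omega[m] * omega[n]
           for m in range(1, N) for n in range(1, N - m + 1))

h = {n: math.log(omega[n]) / n for n in range(1, N + 1)}
for n in (5, 10, 15, 20):
    print(n, round(h[n], 6))
```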

We so far only know the trivial inequality

$$0 \le s^-(x) \le \log(x), \tag{2.16}$$

with $s^-(x) = 0$ for $x = 1$ and $x = 2$. Since appending the last letter once more always creates
a square, we have at most $x - 1$ possibilities to make a square-free word of length $n$ into one
of length $n + 1$ by adding a letter, so we also have ($n \ge 1$)

$$\omega_{n+1}^-(x) \le (x-1) \cdot \omega_n^-(x), \tag{2.17}$$

from which we can improve the upper bound of (2.16) to

$$s^-(x) \le \log(x-1). \tag{2.18}$$

Of course, we can further improve the upper bound by considering the possibilities of
appending (square-free) words which consist of more than one letter. In practice, this amounts
to actually counting the number of square-free words of a certain length. One obtains

$$\omega_{n+k}^-(x) \le \frac{\omega_{k+j}^-(x)}{\omega_j^-(x)} \cdot \omega_n^-(x), \tag{2.19}$$

where $j \in \{0, 1, 2\}$, and $n \ge j$ and $k > 0$ are arbitrary. This expression follows by extending
square-free words of length $n$ with an overlapping square-free word of length $k + j$. By
restricting the length of the overlap to $j \le 2$, all possibilities appear with the same frequency
(by symmetry under permutations of the letters). For the entropy, this yields the upper bounds

$$s^-(x) \le \frac{1}{k}\left(\log(\omega_{k+j}^-(x)) - \log(\omega_j^-(x))\right), \tag{2.20}$$

which clearly give the strongest bound for $j = 2$, with $\omega_2^-(x) = x(x-1)$.
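For $x = 3$ and $j = 2$, $k = n - 2$, the bound (2.20) reads $s^-(3) \le \log(\omega_n^-(3)/6)/(n-2)$, which is the last column of Table 1. A minimal sketch (our own; helper names hypothetical) reproduces the first rows from brute-force counts and prints the cruder bound (2.18) for comparison.

```python
import math


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def omega_minus(alphabet: str, n_max: int) -> list[int]:
    words, counts = [""], [1]
    for _ in range(n_max):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
        counts.append(len(words))
    return counts


def upper_bound(omega: list[int], k: int, j: int = 2) -> float:
    """Right-hand side of Eq. (2.20): (log omega_{k+j} - log omega_j) / k."""
    return (math.log(omega[k + j]) - math.log(omega[j])) / k


omega3 = omega_minus("abc", 16)
for n in (3, 4, 10, 16):
    print(n, f"{upper_bound(omega3, n - 2):.8f}")
print("log(x - 1) =", f"{math.log(2):.8f}")   # the bound (2.18) for x = 3
```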

Now, a more interesting point is the question of a lower bound for the entropy $s^-(x)$.
We know it is zero for $x = 1$ and $x = 2$. As we will show, it is strictly positive for all other
$x$, i.e. for $x > 2$. The case $x = 3$ will play a special role, but let us first give some simple
results. Here, we rely on the well-known fact that there exists at least one square-free word
of infinite length in three letters, compare Ref. [?] and references therein. But then, we can
directly show
directly show 
Theorem 1 For $x \ge 3$, the entropy of $\mathcal{A}^-(x)$ has a lower bound: $s^-(x) \ge \frac{1}{3}\log(x-2)$.
In particular, the entropy is strictly positive for $x > 3$.

PROOF: Let $w$ be a square-free word of infinite length in 3 letters, $\{a, b, c\}$ say, which we
know to exist from Refs. [9], [14]. Let $w_n$ be the subword made from the first $n$ letters of $w$,
which is also square-free and contains $m_a$, $m_b$, $m_c$ letters of type $a$, $b$, $c$, respectively, where
$m_a + m_b + m_c = n$. Now, let $\{d_1, d_2, \ldots, d_p\}$ be $p$ new letters, $p \ge 0$. If we fix all $b$'s and $c$'s
in $w_n$, we have $(p+1)^{m_a}$ possibilities to make square-free words of length $n$ in $p + 3$ letters by
independently replacing each $a$ in $w_n$ by an element of $\{a, d_1, \ldots, d_p\}$; the results are
square-free because the letter map $d_i \mapsto a$ sends any square in them back to a square in $w_n$.
We can analogously proceed for the other two letters, $b$ and $c$, through fixing $a, c$ and $a, b$,
respectively; any two of these three families have only $w_n$ itself in common. In this
setup, we have $x = p + 3$, and we can conclude (for $x \ge 3$)

$$\omega_n^-(x) \ge (x-2)^{m_a} + (x-2)^{m_b} + (x-2)^{m_c} - 2 \ge 3 \cdot (x-2)^{n/3} - 2, \tag{2.21}$$

where the second inequality is the inequality between arithmetic and geometric means. From
this, we immediately get

$$s^-(x) \ge \lim_{n\to\infty} \frac{1}{n}\log\left(3\,(x-2)^{n/3} - 2\right) = \frac{1}{3}\log(x-2), \tag{2.22}$$

from which the statement follows. □
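The replacement construction in the proof can be tried out on a finite prefix. The sketch below rests on stated assumptions. As the square-free ternary word it takes the first differences of the Thue–Morse sequence, coded by the letters a, b, c. This is a classical infinite square-free word, and the code re-checks the prefix it uses. It adds a single new letter d, so $p = 1$ and $x = 4$; `replacements` is a hypothetical helper. The code confirms that all replaced words stay square-free and that the three families overlap only in $w_n$, so that their union has $(x-2)^{m_a} + (x-2)^{m_b} + (x-2)^{m_c} - 2$ elements.

```python
from itertools import product


def is_square_free(w: str) -> bool:
    """True if no factor of w has the form uu with u non-empty."""
    n = len(w)
    return not any(w[i:i + p] == w[i + p:i + 2 * p]
                   for i in range(n) for p in range(1, (n - i) // 2 + 1))


def thue_morse_difference_word(n: int) -> str:
    """First n letters of t(i+1) - t(i), t the Thue-Morse sequence,
    coded as -1 -> 'a', 0 -> 'b', 1 -> 'c'."""
    t = [bin(i).count("1") % 2 for i in range(n + 1)]
    return "".join("abc"[t[i + 1] - t[i] + 1] for i in range(n))


def replacements(w: str, letter: str, new_letters: str) -> set[str]:
    """Replace each occurrence of `letter` in w independently by `letter`
    or by one of `new_letters`; all other letters stay fixed."""
    positions = [i for i, c in enumerate(w) if c == letter]
    out = set()
    for choice in product(letter + new_letters, repeat=len(positions)):
        v = list(w)
        for i, c in zip(positions, choice):
            v[i] = c
        out.add("".join(v))
    return out


n, new_letters = 12, "d"            # p = 1 new letter, hence x = p + 3 = 4
x = 3 + len(new_letters)
w = thue_morse_difference_word(n)
assert is_square_free(w)

families = [replacements(w, c, new_letters) for c in "abc"]
union = set().union(*families)
assert all(is_square_free(v) for v in union)
# Any two families share only w itself (all three letters occur in w).
assert len(union) == sum(len(f) for f in families) - 2
assert len(union) >= 3 * (x - 2) ** (n / 3) - 2
print(w, len(union))
```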

We cannot gain anything about $\varepsilon := s^-(3)$ this way, although, as we will see, precisely
this $\varepsilon$ is important. Nevertheless, we can do better than (2.22). In the above argument,
we started from an infinite word $w$ in three letters. Instead, we can also apply the same
type of argument to the step from $x$ to $x + 1$ letters: fixing $x - 1$ letters, we still have two
possibilities for the replacement of every occurrence of the remaining letter, and this can be
done in $x$ different ways. As this applies essentially to every word separately, the number of
possibilities behaves almost multiplicatively, i.e. it grows like

$$\omega_n^-(x+1) \sim x \cdot 2^{n/x} \cdot \omega_n^-(x). \tag{2.23}$$

We cannot write $\ge$ instead of $\sim$ here, as one does in fact count several words more than once.

To avoid this, we have to discard all words of $\mathcal{A}_n^-(x)$ that do not contain all letters. Let
us denote the number of square-free words of length $n$ in exactly $x$ letters by $\psi_n(x)$. Clearly,
$\psi_n(0) = \delta_{n,0}$, $\psi_n(1) = \delta_{n,1}$ and $\psi_n(2) = 2(\delta_{n,2} + \delta_{n,3})$. Also, $\psi_n(3) > 0$ for all $n \ge 3$.

Furthermore, we have

$$\omega_n^-(x) = \sum_{k=0}^{x} \binom{x}{k} \psi_n(k), \tag{2.24}$$

and also, obviously, $\omega_n^-(x) \ge \psi_n(x)$ for all $n, x$. For $x > n$, one has $\psi_n(x) = 0$, and
$\psi_x(x) = x!$, the number of permutations of $x$ symbols. By simple counting, one finds
$\psi_{x+1}(x) = x!\,x(x-1)/2$. A little less obvious is the inversion of Eq. (2.24) in the form

$$\psi_n(x) = \sum_{k=0}^{x} (-1)^{x-k} \binom{x}{k} \omega_n^-(k). \tag{2.25}$$
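Equations (2.24) and (2.25), the values of $\psi_x(x)$ and $\psi_{x+1}(x)$ quoted above, and the divisibility of $\psi_n(x)$ by $x!$ can all be checked by brute force for small $x$ and $n$. In this sketch (our own; `square_free`, `omega` and `psi` are hypothetical helper names), $\psi_n(k)$ is counted for the alphabet of the first $k$ letters. By symmetry, this gives the count for any fixed set of $k$ letters.

```python
from functools import lru_cache
from math import comb, factorial
from string import ascii_lowercase


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


@lru_cache(maxsize=None)
def square_free(x: int, n: int) -> tuple[str, ...]:
    """Square-free words of length n over the first x lowercase letters."""
    if n == 0:
        return ("",)
    alphabet = ascii_lowercase[:x]
    return tuple(w + a for w in square_free(x, n - 1) for a in alphabet
                 if not ends_with_square(w + a))


def omega(x: int, n: int) -> int:
    """omega_n^-(x)."""
    return len(square_free(x, n))


@lru_cache(maxsize=None)
def psi(x: int, n: int) -> int:
    """psi_n(x): square-free words of length n containing all x letters."""
    return sum(1 for w in square_free(x, n) if len(set(w)) == x)


for x in range(5):
    for n in range(9):
        # Eq. (2.24): sort the words by the set of letters they actually use.
        assert omega(x, n) == sum(comb(x, k) * psi(k, n) for k in range(x + 1))
        # Eq. (2.25): binomial inversion of (2.24).
        assert psi(x, n) == sum((-1) ** (x - k) * comb(x, k) * omega(k, n)
                                for k in range(x + 1))
        # Letter permutations act freely on words that use all x letters.
        assert psi(x, n) % factorial(x) == 0
    if x >= 1:
        assert psi(x, x) == factorial(x)
        assert psi(x, x + 1) == factorial(x) * x * (x - 1) // 2
```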
The number $\psi_n(x)$ is a multiple of $x!$, since any permutation of the $x$ letters transforms a
square-free word in exactly $x$ letters into another one of this type, and only the identity
permutation leaves such a word unchanged. The advantage of the new numbers is that we can
go from $\psi_n(x)$ to $\psi_n(x+1)$ without any double counting, i.e., two different square-free words of
length $n$ with all $x$ letters in them automatically give two disjoint sets of square-free words of
length $n$ in $x + 1$ letters by the above procedure. If we observe that we must introduce at
least one new letter, we see that, for $x > 1$ and $n > x + 1$, we get the inequality

$$\psi_n(x+1) \ge \tfrac{1}{2} \cdot 2^{n/x} \cdot \psi_n(x). \tag{2.26}$$

This is helpful because we have:

Lemma 2 Let $x \ge 3$ be fixed. Then, the sequences $\psi_n(x)$ and $\omega_n^-(x)$ have the same exponential
growth in $n$:

$$\lim_{n\to\infty} \frac{\log(\omega_n^-(x))}{n} = \lim_{n\to\infty} \frac{\log(\psi_n(x))}{n}.$$

PROOF: Iterating Eq. (2.26), one obtains (for $0 < k < x$ and $n > x$) the inequality

$$\psi_n(k) \le \psi_n(x) \cdot \prod_{j=k}^{x-1} 2^{\,1 - n/j}. \tag{2.27}$$

Consequently, we obtain from Eq. (2.24) the inequality

$$\omega_n^-(x) \le \psi_n(x) \left(1 + \sum_{k=1}^{x-1} \binom{x}{k} \prod_{j=k}^{x-1} 2^{\,1 - n/j}\right). \tag{2.28}$$

Since, for sufficiently large $n$, every single term under the sum is certainly not bigger than $2x$,
we finally get

$$\psi_n(x) \le \omega_n^-(x) \le \bigl(1 + 2x(x-1)\bigr)\,\psi_n(x) \le \bigl(1 + 2x(x-1)\bigr)\,\omega_n^-(x), \tag{2.29}$$

from which the statement easily follows. □

From this Lemma and from Eq. (2.26) we see that the entropic contributions are additive:

$$s^-(x+1) \ge s^-(x) + \frac{\log(2)}{x}, \tag{2.30}$$

which is valid for $x \ge 3$. In fact, repeating the argument of (2.30), we immediately arrive at

Theorem 2 Let $x \ge 3$. The entropy of the dictionary $\mathcal{A}^-(x)$ fulfills

$$\varepsilon + \log(2) \cdot \left(\frac{1}{3} + \frac{1}{4} + \ldots + \frac{1}{x-1}\right) \le s^-(x) \le \log(x-1).$$

Here, and in what follows, we will always use $\varepsilon$ for $s^-(3)$. Of course, one can now use the
formula

$$a_m := 1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{m} - \log(m) \;\xrightarrow{\;m\to\infty\;}\; \gamma \simeq 0.5772\ldots \tag{2.31}$$

for Euler's constant $\gamma$, together with the fact that the sequence $(a_m)_{m\in\mathbb{N}}$ is strictly
decreasing, to simplify the inequality of Theorem 2 for larger values of $x$, while for small
values it is better to stick to the finite sum.
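To see how the bounds of Theorem 2 behave, the following sketch evaluates them. It is an illustration only, and the function names are ours. It inserts the numerical estimate $\varepsilon \simeq 0.263719$ from (2.35); substituting the rigorous value $\log(2)/21$ of Theorem 3 instead gives a rigorous but weaker bound. Since $a_m$ decreases to $\gamma$, one has $1 + \frac12 + \ldots + \frac1m > \gamma + \log(m)$, which yields the simplified closed form used in `lower_bound_gamma`.

```python
import math

EPS_ESTIMATE = 0.263719            # numerical estimate (2.35), not a proven value
EULER_GAMMA = 0.5772156649015329


def lower_bound(x: int, eps: float = EPS_ESTIMATE) -> float:
    """Finite-sum lower bound of Theorem 2: eps + log 2 * (1/3 + ... + 1/(x-1))."""
    return eps + math.log(2) * sum(1 / m for m in range(3, x))


def lower_bound_gamma(x: int, eps: float = EPS_ESTIMATE) -> float:
    """Weaker closed form: 1/3 + ... + 1/(x-1) = H_{x-1} - 3/2 > gamma + log(x-1) - 3/2."""
    return eps + math.log(2) * (EULER_GAMMA + math.log(x - 1) - 1.5)


def upper_bound(x: int) -> float:
    return math.log(x - 1)         # Eq. (2.18)


for x in (3, 4, 5, 10, 100):
    print(x, f"{lower_bound_gamma(x):.4f} < {lower_bound(x):.4f} "
             f"<= s^-(x) <= {upper_bound(x):.4f}")
```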

Again, we have seen that $\varepsilon$ plays a special role. In fact, although this does not follow
from any simple inequality of the above type, $\varepsilon$ is strictly positive [17], [18]. Brandenburg
[?] shows that

$$\omega_{22n}^-(3) \ge 2^n\, \omega_n^-(3) \tag{2.32}$$

and concludes from this that

$$\omega_n^-(3) \ge 6 \cdot 2^{n/22}. \tag{2.33}$$
In fact, using the existence of the limit (which we have shown above), we can improve
the argument slightly (cf. Refs. [17], [19]) to obtain a rigorous lower bound for the entropy of
square-free words on three letters:

Theorem 3 The entropy $\varepsilon = s^-(3)$ is strictly positive:

$$\varepsilon \ge \frac{1}{21}\log(2) \simeq 0.033007\ldots \tag{2.34}$$

Since this kind of result has been described several times already [17], [18], [19], we shall not
repeat the proof. The idea behind it is the following: one tries to find a set of substitution
rules which map square-free words into square-free words of increased length and
simultaneously allow some free choice to do so in each step, which accounts for the
exponential growth. It is not clear whether one can significantly improve the lower bound in
this way so as to approach our numerical estimate of

$$\varepsilon \simeq 0.263719(1). \tag{2.35}$$
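The step from (2.32) to (2.34) can be made explicit. The following is our reading of the "slight improvement", not necessarily the argument of Refs. [17], [19]. It uses only (2.32) and the existence of the limit (Proposition 2): take logarithms in (2.32), divide by $22n$ and let $n \to \infty$.

```latex
\frac{\log \omega_{22n}^-(3)}{22n}
  \;\ge\; \frac{\log 2}{22} + \frac{1}{22}\,\frac{\log \omega_n^-(3)}{n}
\;\;\xrightarrow{\;n\to\infty\;}\;\;
\varepsilon \;\ge\; \frac{\log 2}{22} + \frac{\varepsilon}{22}
\;\;\Longrightarrow\;\;
\varepsilon \;\ge\; \frac{\log 2}{21} \simeq 0.033007 .
```

By contrast, (2.33) on its own only yields $\varepsilon \ge \log(2)/22$.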



3 Some results for three letters 

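Let us describe the case $x = 3$ in more detail. We can write the set of square-free words of
length $n$ in three letters as a disjoint union of three subsets

$$\mathcal{A}_n^- = \mathcal{A}_n^{(0)} \cup \mathcal{A}_n^{(1)} \cup \mathcal{A}_n^{(2)}. \tag{3.1}$$

Here, $\mathcal{A}_n^{(0)}$ denotes the set of what we call stop-words. These are characterized by the
property that appending any letter of the alphabet $\mathcal{A}$ yields a square-containing
word. In the same spirit, $\mathcal{A}_n^{(1)}$ and $\mathcal{A}_n^{(2)}$ are defined as the sets of square-free words which
allow, respectively, one and two extensions to a square-free word of length $n + 1$. (For $n \ge 1$,
three extensions are impossible, since repeating the last letter always creates a square.)
Introducing the notation

$$\omega_n^{(k)} := |\mathcal{A}_n^{(k)}|, \qquad k \in \{0, 1, 2\}, \tag{3.2}$$

this implies the relations

$$\omega_n^- = \sum_{k=0}^{2} \omega_n^{(k)}, \qquad \omega_{n+1}^- = \sum_{k=0}^{2} k\,\omega_n^{(k)} = \omega_n^{(1)} + 2\,\omega_n^{(2)}. \tag{3.3}$$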
Hence the growth rate is given by

$$\frac{\omega_{n+1}^-}{\omega_n^-} = 1 + \frac{\omega_n^{(2)} - \omega_n^{(0)}}{\omega_n^-}. \tag{3.4}$$

Since the left-hand side converges to a finite value in the limit $n \to \infty$, so does the right-hand
side, and by the positivity of the entropy we see that convergence of the sequence $\omega_n^{(0)}/\omega_n^-$
would imply the convergence of the ratio $\omega_n^{(2)}/\omega_n^-$ to a finite non-zero limit.
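The decomposition (3.1) to (3.3) can be reproduced directly for small $n$. In this sketch (our illustration; the names are ours), each square-free ternary word is classified by the number of letters that can be appended without creating a square. The resulting counts can be compared with the first rows of Table 2, and the identity (3.3) is asserted.

```python
from collections import Counter


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def extension_counts(n: int, alphabet: str = "abc") -> tuple[int, int, int]:
    """(omega_n^(0), omega_n^(1), omega_n^(2)) for square-free words of length n >= 1."""
    words = [""]
    for _ in range(n):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
    k = Counter(sum(not ends_with_square(w + a) for a in alphabet) for w in words)
    assert k[3] == 0                      # repeating the last letter never works
    return k[0], k[1], k[2]


rows = {n: extension_counts(n) for n in range(1, 16)}
for n in range(1, 15):
    w0, w1, w2 = rows[n]
    assert w1 + 2 * w2 == sum(rows[n + 1])     # Eq. (3.3)
print(rows[7], rows[11])                       # the first stop-words appear at n = 7
```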

In order to gain some insight into the actual behaviour of these quantities, we used the
computer to investigate square-free words in three letters for lengths up to 90. The results
are presented in Tables 1 and 2.

In Table 1, we list the number of square-free words $\omega_n^-$, approximants to the entropy
obtained from the ratio of successive values, and upper limits to the entropy using Eq. (2.20).
From these, we extract

$$\varepsilon \simeq 0.263719(1) \tag{3.5}$$

as the approximate value of the entropy of square-free words on three letters, where the
figure in parentheses denotes the estimated uncertainty in the last digit.

It is striking that the logarithm of the ratio approaches the limit value much
faster than the value in the last column. This in fact suggests that the asymptotic behaviour
looks as follows:

$$\log(\omega_n^-) \simeq \varepsilon \cdot n + \alpha + o(n^{-1}) \qquad (n \to \infty), \tag{3.6}$$

where the constant term can be estimated as $\alpha \simeq 2.5438965$. It is interesting that the
next order seems to fall off so quickly, which also indicates that no logarithmic corrections are
present.
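The two derived columns of Table 1 are determined by $\omega_n^-$, and (3.6) explains why they converge at such different rates. The ratio column equals $\varepsilon + o(n^{-1})$. Inserting (3.6) into (2.20) with $j = 2$ instead gives $\log(\omega_n^-/6)/(n-2) \simeq \varepsilon + (\alpha + 2\varepsilon - \log 6)/(n-2)$, which is only a $1/n$ approach. The small sketch below is ours. It recomputes both columns from the tabulated counts for $n = 82, \ldots, 90$ and compares them with this prediction, taking $\varepsilon$ and $\alpha$ as quoted in (3.5) and (3.6).

```python
import math

EPS, ALPHA = 0.263719, 2.5438965          # estimates quoted in (3.5) and (3.6)

# omega_n^-(3) for n = 81, ..., 90, copied from Table 1
omega = {81: 24091726728, 82: 31361678988, 83: 40825520274, 84: 53145145482,
         85: 69182396616, 86: 90058945560, 87: 117235364616, 88: 152612592438,
         89: 198665414208, 90: 258615015792}


def ratio_column(n: int) -> float:        # log(omega_n / omega_{n-1})
    return math.log(omega[n] / omega[n - 1])


def bound_column(n: int) -> float:        # log(omega_n / 6) / (n - 2), Eq. (2.20), j = 2
    return math.log(omega[n] / 6) / (n - 2)


def predicted_bound(n: int) -> float:     # what (3.6) implies for the bound column
    return EPS + (ALPHA + 2 * EPS - math.log(6)) / (n - 2)


for n in range(82, 91):
    print(n, f"{ratio_column(n):.8f}", f"{bound_column(n):.8f}", f"{predicted_bound(n):.8f}")
```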

Table 2 contains the values of $\omega_n^{(k)}$ ($k = 0, 1, 2$) and their ratios with $\omega_n^-$. Apparently, the
ratios converge as $n \to \infty$, which means that all three subsets have in fact the same entropy
as the set of square-free words itself. For the ratios, we estimate

$$\frac{\omega_n^{(0)}}{\omega_n^-} \to 0.036837, \qquad \frac{\omega_n^{(1)}}{\omega_n^-} \to 0.624564, \qquad \frac{\omega_n^{(2)}}{\omega_n^-} \to 0.338599 \tag{3.7}$$

in the limit $n \to \infty$, where the uncertainty is about one unit in the last digit.
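As a consistency check (our own), the limits in (3.7) should sum to one. Inserted into the right-hand side of (3.4), they should also reproduce the entropy estimate (3.5).

```python
import math

r0, r1, r2 = 0.036837, 0.624564, 0.338599    # limits quoted in (3.7)

total = r0 + r1 + r2                          # should be 1 up to rounding
growth = 1 + r2 - r0                          # limit of the right-hand side of (3.4)
eps_from_ratios = math.log(growth)            # compare with (3.5): 0.263719(1)
print(total, eps_from_ratios)
```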

We note that the stop-words still show an interesting structure if one looks at the periods
of the three squares that one obtains on appending the three different letters. Apparently,
stop-words with certain fixed sets of periods still occur with the same entropy density, whereas
other sets of periods are limited to "symmetric" stop-words that cannot be extended in either
direction and therefore show up in finitely many stop-words only. As an example of this, we
mention the shortest stop-words ($abacaba$ and those obtained from it by permutations of the
letters, where $a$, $b$ and $c$ denote the three letters), which result in squares of periods 1, 2 and 4
($aa$, $abab$ and $abacabac$). Here, it is easy to see that these are the only words with this property.
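The statements about the shortest stop-words are easy to confirm by machine. This sketch (ours; the helper names are hypothetical) does four things. It lists all stop-words of length 7 and checks that they are exactly the six letter permutations of $abacaba$. It finds the period of the square created by each appended letter. Finally, it verifies that no letter can be prepended either.

```python
from itertools import permutations


def ends_with_square(w: str) -> bool:
    n = len(w)
    return any(w[n - 2 * p:n - p] == w[n - p:] for p in range(1, n // 2 + 1))


def is_square_free(w: str) -> bool:
    # every square occurrence ends at some position, i.e. at the end of a prefix
    return not any(ends_with_square(w[:i]) for i in range(1, len(w) + 1))


def square_free_words(alphabet: str, n: int) -> list[str]:
    words = [""]
    for _ in range(n):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
    return words


def square_period(w: str) -> int:
    """Period |u| of the shortest square uu at the end of w."""
    n = len(w)
    return next(p for p in range(1, n // 2 + 1) if w[n - 2 * p:n - p] == w[n - p:])


stop7 = {w for w in square_free_words("abc", 7)
         if all(ends_with_square(w + a) for a in "abc")}
orbit = {"abacaba".translate(str.maketrans("abc", "".join(p))) for p in permutations("abc")}
assert stop7 == orbit and len(orbit) == 6

periods = sorted(square_period("abacaba" + a) for a in "abc")
blocked_left = all(not is_square_free(a + "abacaba") for a in "abc")
print(periods, blocked_left)
```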
Table 1: Number of square-free words in three letters; estimates and upper limits for their entropy. Entries marked ? are illegible.

  n            ω_n   log(ω_n/ω_{n-1})   log(ω_n/6)/(n-2)
  1              3
  2              6         0.69314718
  3             12         0.69314718         0.69314718
  4             18         0.40546511         0.54930615
  5             30         0.51082562         0.53647929
  6             42         0.33647224         0.48647752
  7             60         0.35667494         0.46051702
  8             78         0.26236426         0.42749155
  9            108         0.32542240         0.41291025
 10            144         0.28768207         0.39725673
 11            204         0.34830669         0.39181784
 12            264         0.25782911         0.37841898
 13            342         0.25886163         0.36755010
 14            456         0.28768207         0.36089444
 15            618         0.30399565         0.35651761
 16            798         0.25562014         0.34931064
 17           1044                  ?         0.34393701
 18           1392                  ?         0.34042108
 19           1830                  ?         0.33648893
 20           2388                  ?         0.33258069
 21           3180                  ?         0.33015144
 22           4146                  ?         0.32690698
 23           5418                  ?         0.32408205
 24           7032                  ?         0.32120302
 25           9198                  ?         0.31891227
 26          11892                  ?         0.31632757
 27          15486                  ?         0.31423730
 28          20220                  ?         0.31241032
 29          26424                  ?         0.31075069
 30          34422                  ?         0.30909613
 31          44862                  ?         0.30757198
 32          58446                  ?         0.30613664
 33          76122                  ?         0.30478492
 34          99276                  ?         0.30355936
 35         129516                  ?         0.30241820
 36         168546                  ?         0.30127072
 37         219516         0.26421641         0.30021203
 38         285750         0.26369218         0.29919758
 39         372204         0.26432479         0.29825506
 40         484446         0.26356388         0.29734215
 41         630666         0.26377043         0.29648134
 42         821154         0.26393426         0.29566768
 43        1069512         0.26424708         0.29490131
 44        1392270         0.26373304         0.29415920
 45        1812876         0.26397903         0.29345734
 46        2359710                  ?                  ?
 47        3072486                  ?                  ?
 48        4000002                  ?                  ?
 49        5207706                  ?                  ?
 50        6778926                  ?                  ?
 51        8824956                  ?                  ?
 52       11488392                  ?                  ?
 53       14956584                  ?                  ?
 54       19470384                  ?                  ?
 55       25346550                  ?                  ?
 56       32996442                  ?                  ?
 57       42957300                  ?                  ?
 58       55921896                  ?                  ?
 59       72798942                  ?                  ?
 60       94766136                  ?                  ?
 61      123368406                  ?                  ?
 62      160596120                  ?                  ?
 63      209059806                  ?                  ?
 64      272143380                  ?                  ?
 65      354271314                  ?                  ?
 66      461181036                  ?                  ?
 67      600356406                  ?                  ?
 68      781520994                  ?                  ?
 69     1017362166                  ?                  ?
 70     1324371090                  ?                  ?
 71     1724034504                  ?                  ?
 72     2244278358                  ?                  ?
 73     2921521164                  ?                  ?
 74     3803130042                  ?                  ?
 75     4950798954                  ?                  ?
 76     6444761514                  ?                  ?
 77     8389549680                  ?                  ?
 78    10921197582                  ?                  ?
 79    14216853012                  ?                  ?
 80    18506985300                  ?                  ?
 81    24091726728                  ?                  ?
 82    31361678988         0.26371824         0.27971366
 83    40825520274         0.26372065         0.27951622
 84    53145145482         0.26371938         0.27932357
 85    69182396616         0.26371968         0.27913558
 86    90058945560         0.26371796         0.27895203
 87   117235364616         0.26371917         0.27877282
 88   152612592438         0.26371906         0.27859778
 89   198665414208         0.26371944         0.27842676
 90   258615015792         0.26371846         0.27825962
Table 2: Number and relative frequency of square-free words in three letters that allow 0, 1, and 2 extensions, respectively. Entries marked ? are illegible.

  n      ω^(0)       ω^(1)       ω^(2)   ω^(0)/ω_n   ω^(1)/ω_n   ω^(2)/ω_n
  1          0           0           3  0.00000000  0.00000000  1.00000000
  2          0           0           6  0.00000000  0.00000000  1.00000000
  3          0           6           6  0.00000000  0.50000000  0.50000000
  4          0           6          12  0.00000000  0.33333333  0.66666667
  5          0          18          12  0.00000000  0.60000000  0.40000000
  6          0          24          18  0.00000000  0.57142857  0.42857143
  7          6          30          24  0.10000000  0.50000000  0.40000000
  8          0          48          30  0.00000000  0.61538462  0.38461538
  9          0          72          36  0.00000000  0.66666667  0.33333333
 10          0          84          60  0.00000000  0.58333333  0.41666667
 11          6         132          66  0.02941176  0.64705882  0.32352941
 12          6         174          84  0.02272727  0.65909091  0.31818182
 13          6         216         120  0.01754386  0.63157895  0.35087719
 14          6         282         168  0.01315789  0.61842105  0.36842105
 15         24         390         204  0.03883495           ?           ?
 16         24         504         270  0.03007519           ?           ?
 17         24         648         372  0.02298851           ?           ?
 18         36         882         474  0.02586207           ?           ?
 19         54        1164         612  0.02950820           ?           ?
 20         54        1488         846  0.02261307           ?           ?
 21        120        1974        1086  0.03773585           ?           ?
 22        138        2598        1410  0.03328509           ?           ?
 23        216        3372        1830  0.03986711           ?           ?
 24        240        4386        2406  0.03412969           ?           ?
 25        384        5736        3078  0.04174821           ?           ?
 26        444        7410        4038  0.03733602           ?           ?
 27        528        9696        5262  0.03409531           ?           ?
 28        690       12636        6894  0.03412463           ?           ?
 29        966       16494        8964  0.03655767           ?           ?
 30       1236       21510       11676  0.03590727           ?           ?
 31       1602       28074       15186  0.03570951           ?           ?
 32       2112       36546       19788  0.03613592           ?           ?
 33       2712       47544       25866  0.03562702           ?           ?
 34       3522       61992       33762  0.03547685           ?           ?
 35       4818       80850       43848  0.03720004           ?           ?
 36       6150      105276       57120  0.03648856           ?           ?
 37       8094      137094       74328  0.03687203  0.62452851  0.33859946
 38      10452      178392       96906  0.03657743  0.62429396  0.33912861
 39      13854      232254      126096  0.03722152  0.62399652  0.33878196
 40      17784      302658      164004  0.03670997  0.62475075  0.33853928
 41      23082      394014      213570  0.03659940  0.62475859  0.33864201
 42      29970      512856      278328  0.03649742  0.62455520  0.33894738
 43      39438      667878      362196  0.03687476  0.62446985  0.33865539
 44      51030      869604      471636  0.03665237  0.62459437  0.33875326
 45      66792     1132458      613626  0.03684312  0.62467483  0.33848206
 46      86502     1473930      799278  0.03665789  0.62462336  0.33871874
 47     113064     1918842     1040580  0.03679887  0.62452425  0.33867689
 48     147036     2498226     1354740  0.03675898  0.62455619  0.33868483
 49     191952     3252582     1763172  0.03685922  0.62457097  0.33856980
 50     249390     4234116     2295420  0.03678901  0.62459983  0.33861116
 51     324852     5511816     2988288  0.03681061  0.62457150  0.33861789
 52     422712     7174776     3890904  0.03679471  0.62452395  0.33868134
 53     550758     9341268     5064558  0.03682378  0.62455892  0.33861729
 54     716454    12161310     6592620  0.03679712  0.62460555  0.33859733
 55     932592    15831474     8582484  0.03679365  0.62460074  0.33860561
 56    1213602    20608380    11174460  0.03677978  0.62456370  0.33865651
 57    1582026    26828652    14546622  0.03682787  0.62454232  0.33862980
 58    2058528    34927794    18935574  0.03681077  0.62458172  0.33860751
 59    2681796    45468156    24648990  0.03683839  0.62457166  0.33858995
 60    3488478    59186910    32090748  0.03681144  0.62455760  0.33863096
 61    4545588    77049516    41773302  0.03684564  0.62454820  0.33860616
 62    5914926   100302582    54378612  0.03683106  0.62456417  0.33860477
 63    7701792   130572648    70785366  0.03684014  0.62457079  0.33858907
 64   10021482   169972482    92149416  0.03682427  0.62456960  0.33860613
 65   13049082   221263428   119958804  0.03683358  0.62455926  0.33860716
 66   16985274   288035118   156160644  0.03682995  0.62455976  0.33861029
 67   22114344   374963130   203278932  0.03683536  0.62456755  0.33859709
 68   28782414   488114994   264623586  0.03682872  0.62457055  0.33860074
 69   37472418   635408406   344481342  0.03683292  0.62456461  0.33860247
 70   48778746   827150184   448442160  0.03683163  0.62456074  0.33860763
 71   63510756  1076769138   583754610  0.03683845  0.62456357  0.33859799
 72   82666266  1401703020   759909072  0.03683423  0.62456736  0.33859840
 73  107616300  1824679686   989225178  0.03683571  0.62456494  0.33859935
 74  140084994  2375291142  1287753906  0.03683413  0.62456217  0.33860370
 75  182377848  3092080698  1676340408  0.03683806  0.62456196  0.33859998
 76  237398214  4025176920  2182186380  0.03683584  0.62456569  0.33859847
77 


309038124 


5239825530 


2840686026 





03683608 





62456577 


0.33859815 


78 


402276216 


6820989720 


3697931646 





03683444 





62456426 


0.33860130 


79 


523700664 


8879319396 


4813832952 





03683661 





62456293 


0.33860046 


80 


681718896 


11558806080 


6266460324 





03683576 





62456450 


0.33859974 


81 


887460042 


15046854384 


8157412302 





03683671 





62456521 


0.33859808 


82 


1155219294 


19587399114 


10619060580 





03683538 





62456475 


0.33859987 


83 


1503883698 


25498127670 


13823508906 





03683685 





62456345 


0.33859970 


84 


1957680408 


33192533532 


17994931542 





03683649 





62456379 


0.33859972 


85 


2548490760 


43208866152 


23425039704 





03683727 





62456446 


0.33859827 


86 


3317442636 


56247641232 


30493861692 





03683635 





62456473 


0.33859892 


87 


4318568760 


73220999274 


39695796582 





03683674 





62456409 


0.33859917 


88 


5621734092 


95316302484 


51674555862 





03683663 





62456381 


0.33859955 


89 


7318253526 


124079305572 


67267855110 





03683708 





62456420 


0.33859872 
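The three fraction columns of Table 2 appear to be the integer counts divided by their sum, $\omega^-_n = \omega^{(0)}_n + \omega^{(1)}_n + \omega^{(2)}_n$ (for n = 37, for instance, 8094 + 137094 + 74328 = 219516). The short Python sketch below assumes this reading of the columns and re-derives the printed fractions for a few legible rows; the helper name `fractions` and the row selection are ours, not part of the original computation.

```python
# Sketch: re-derive the fraction columns of Table 2 from the integer columns.
# Assumption: fraction i = omega_n^(i) / (omega_n^(0) + omega_n^(1) + omega_n^(2)).

# n: (omega0, omega1, omega2, printed fractions), copied from Table 2
ROWS = {
    37: (8094, 137094, 74328, (0.03687203, 0.62452851, 0.33859946)),
    46: (86502, 1473930, 799278, (0.03665789, 0.62462336, 0.33871874)),
    89: (7318253526, 124079305572, 67267855110,
         (0.03683708, 0.62456420, 0.33859872)),
}


def fractions(counts):
    """Hypothetical helper: each count divided by the total of all counts."""
    total = sum(counts)
    return tuple(c / total for c in counts)


for n, (w0, w1, w2, printed) in ROWS.items():
    computed = fractions((w0, w1, w2))
    # The table prints 8 decimals, so allow one unit in the last digit.
    assert all(abs(a - b) < 1e-8 for a, b in zip(computed, printed)), n
    print(n, w0 + w1 + w2, " ".join(f"{f:.8f}" for f in computed))
```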
4 Basics of an algebraic approach



It is now time to attack the square-free words with some algebra. To this end, we change
notation and write

$$P_n(x) := |A^-_n(x)| = \omega^-_n(x) \tag{4.1}$$

since we will no longer talk about words that contain squares. Clearly, $P_n(x)$ is always
an integer, and we also know that $P_n(x) \le x^n$ for all $n \ge 0$, with equality only for
$n = 0$ and $n = 1$. It is straightforward to calculate the first cases explicitly:

$$P_0(x) = 1,\quad P_1(x) = x,\quad P_2(x) = x(x-1),$$
$$P_3(x) = x(x-1)^2,\quad P_4(x) = x^2(x-1)(x-2). \tag{4.2}$$
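The closed forms in (4.2) can be checked against a direct enumeration. The following Python sketch is our illustration, not the authors' counting program: it builds square-free words over the letters 0, ..., x-1 by depth-first extension, which suffices because a word is square-free exactly when none of its prefixes ends in a square. The function names `has_square_suffix` and `count_square_free` are hypothetical.

```python
def has_square_suffix(word):
    """True if the tuple `word` ends with a square uu, u non-empty."""
    n = len(word)
    return any(word[n - 2 * h:n - h] == word[n - h:] for h in range(1, n // 2 + 1))


def count_square_free(n, x):
    """P_n(x): number of square-free words of length n over the letters 0..x-1."""
    def extend(word):
        if len(word) == n:
            return 1
        total = 0
        for letter in range(x):
            candidate = word + (letter,)
            # The prefix is square-free, so only suffixes of the new word can be squares.
            if not has_square_suffix(candidate):
                total += extend(candidate)
        return total
    return extend(())


# Closed forms of Eq. (4.2)
CLOSED_FORMS = {
    0: lambda x: 1,
    1: lambda x: x,
    2: lambda x: x * (x - 1),
    3: lambda x: x * (x - 1) ** 2,
    4: lambda x: x ** 2 * (x - 1) * (x - 2),
}

for n, formula in CLOSED_FORMS.items():
    for x in range(1, 8):
        assert count_square_free(n, x) == formula(x), (n, x)
print("Eq. (4.2) agrees with enumeration for x = 1, ..., 7")
```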

The $P_n(x)$ are polynomials in $x$, as can be shown by induction: $P_{n+1}(x)$ can be built
recursively from $(x-1)P_n(x)$, corrected by lower-order terms that are sums of products of
$P_k(x)$ with $k < n$; hence we stay with a polynomial.

Let us consider these polynomials in more detail. If $x \ge 2$ and $n \ge 2$, we can fix the
first two letters of the square-free words of length $n$ that we want to count, say as $ab$.
There are $x(x-1)$ possibilities for this, each of which must have equally many square-free
continuations, since permuting the letters of the alphabet maps square-free words to
square-free words. Similarly, let $x \ge 3$ and $n \ge 4$. A square-free word then starts either
as $abc$ (with $x(x-1)(x-2)$ possibilities, each with equally many square-free continuations)
or as $abac$ (again $x(x-1)(x-2)$ possibilities); no other start is possible, because $aba$ can
be followed neither by $a$ nor by $b$. The sets of words obtained from these two classes are
disjoint, so we can conclude

Proposition 3 Let $P_n(x)$ be the number of square-free words of length $n$ in an alphabet of
$x$ letters. Then, for $n > 1$, $P_n(x)$ is a multiple of $x(x-1)$, while for $n > 3$ it is a
multiple of $x(x-1)(x-2)$. Furthermore, $P_n(x)$ is a polynomial in $x$ of degree $n$ with
integer coefficients and leading coefficient 1.
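The counting argument behind Proposition 3 can be illustrated numerically. In the Python sketch below (our illustration; `completions` is a hypothetical helper), the representative prefixes $abc$ and $abac$ are taken as (0, 1, 2) and (0, 1, 0, 2); by the letter-permutation symmetry, every other prefix of the same shape has the same number of square-free continuations, so $P_n(x) = x(x-1)(x-2)\,(N_{abc} + N_{abac})$ for $n \ge 4$.

```python
def has_square_suffix(word):
    """True if the tuple `word` ends with a square uu, u non-empty."""
    n = len(word)
    return any(word[n - 2 * h:n - h] == word[n - h:] for h in range(1, n // 2 + 1))


def completions(prefix, n, x):
    """Number of square-free words of length n over 0..x-1 that start with
    `prefix` (assumed to be square-free itself)."""
    if len(prefix) == n:
        return 1
    total = 0
    for letter in range(x):
        word = prefix + (letter,)
        if not has_square_suffix(word):
            total += completions(word, n, x)
    return total


for x in range(3, 6):
    for n in range(4, 8):
        p_n = completions((), n, x)
        n_abc = completions((0, 1, 2), n, x)
        n_abac = completions((0, 1, 0, 2), n, x)
        # For n >= 4 every square-free word starts with a pattern abc or abac,
        # and each pattern can be realised by x(x-1)(x-2) choices of letters.
        assert p_n == x * (x - 1) * (x - 2) * (n_abc + n_abac), (x, n)
        # First statement of the proposition (n > 1): divisible by x(x-1).
        assert p_n % (x * (x - 1)) == 0
print("checked x = 3..5, n = 4..7")
```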

There is an alternative way to see that the last statement of the proposition is correct.
Clearly, the number of all words of length $n$ in $x$ letters is just $x^n$, and from this we
have to subtract the number of words that contain at least one square. Demanding that a word
contains a square of a certain length (and no square of shorter length) necessarily means
fixing a number of letters to coincide (and certain others to be different), so all the
corresponding terms are of lower order in $x$. They also all have integer coefficients, since
these are nothing but combinatorial factors. Furthermore, as $P_n(x) \le x^n$, the leading
coefficient of $P_n(x) - x^n$ has to be negative.
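Since a polynomial of degree $n$ is fixed by its values at $n+1$ points, the last statement can also be probed numerically. The Python sketch below (our illustration; all helper names are hypothetical) interpolates $P_n(x)$ exactly from enumerated values at $x = 0, \dots, n$ by Lagrange interpolation over rational numbers (`fractions.Fraction`), checks the result at one further point, and confirms integer coefficients with leading coefficient 1 for small $n$.

```python
from fractions import Fraction


def has_square_suffix(word):
    """True if the tuple `word` ends with a square uu, u non-empty."""
    n = len(word)
    return any(word[n - 2 * h:n - h] == word[n - h:] for h in range(1, n // 2 + 1))


def count_square_free(n, x):
    """P_n(x) by depth-first extension of square-free words over 0..x-1."""
    def extend(word):
        if len(word) == n:
            return 1
        return sum(extend(word + (a,)) for a in range(x)
                   if not has_square_suffix(word + (a,)))
    return extend(())


def times_linear(coeffs, root):
    """Multiply a polynomial (coefficients, lowest degree first) by (x - root)."""
    out = [Fraction(0)] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i + 1] += c
        out[i] -= root * c
    return out


def interpolate(points):
    """Lagrange interpolation: coefficients of the polynomial through `points`."""
    coeffs = [Fraction(0)] * len(points)
    for j, (xj, yj) in enumerate(points):
        basis, denom = [Fraction(1)], Fraction(1)
        for m, (xm, _) in enumerate(points):
            if m != j:
                basis = times_linear(basis, xm)
                denom *= xj - xm
        for i, b in enumerate(basis):
            coeffs[i] += yj * b / denom
    return coeffs


def coefficients(n):
    """Integer coefficients of P_n (lowest degree first), fitted at x = 0..n."""
    coeffs = interpolate([(x, count_square_free(n, x)) for x in range(n + 1)])
    extra = n + 1  # a point not used for the fit
    assert sum(c * extra ** i for i, c in enumerate(coeffs)) == count_square_free(n, extra)
    assert all(c.denominator == 1 for c in coeffs)  # integer coefficients
    return [int(c) for c in coeffs]


for n in range(7):
    c = coefficients(n)
    assert c[-1] == 1  # degree n, leading coefficient 1
    print(f"P_{n}(x), coefficients of x^0, x^1, ...:", c)
```

For $n = 4$ this reproduces the expansion $x^4 - 3x^3 + 2x^2$ of the closed form in (4.2).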

So far, we have not managed to find the generating function for the polynomials $P_n(x)$, and
there are the usual indications that this might be a very difficult task: quite probably, it is
one of those functions that cannot be analytically continued beyond their circle of
convergence. We assume this because of various unsuccessful attempts to find generating
functions for the numbers $\omega^-_n(x)$ with standard computer algebra packages, although
this is certainly not conclusive. We have therefore tried to find a reasonable approximation
scheme. From explicit counting, and from the determination of the $P_n(x)$ up to $n = 15$, we
observe that, for $n > 2$, the polynomials satisfy a recurrence of the form

$$P_{n+1}(x) = (x-1)\,P_n(x) - P_{n-1}(x) + R_{n-2}(x). \tag{4.3}$$

Here, the remainder $R_{n-2}(x)$ is a polynomial in $x$ of degree at most $n-2$, but usually of
considerably smaller degree (though we do not know how to quantify this at the moment). The
first two terms on the right-hand side of (4.3) are clear: there are $x-1$ possibilities to
extend a square-free word of length $n$ to one of length $n+1$ without creating a square of
length 2 at the end. From these (if $n > 2$), essentially $P_{n-1}(x)$ words have to be
subtracted because they contain a square of length 4 at the end. Further restrictions reach
deeper into the word, as can be seen from the structure of the stop-words.
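Given enumerated values of $P_n(x)$, the remainder in (4.3) can be read off directly as $R_{n-2}(x) = P_{n+1}(x) - (x-1)P_n(x) + P_{n-1}(x)$. The Python sketch below (hypothetical helper names, brute-force counting rather than the authors' program) collects $P_0(x), \dots, P_{n_{\max}}(x)$ in a single depth-first pass and prints these remainders. Expanding (4.2) by hand gives $R_1(x) = 0$, and with $P_5(x) = x(x-1)^4 - 2x(x-1)^2 + x(x-1)$ (inclusion-exclusion over squares of length 4) one finds $R_2(x) = x(x-1)$; the assertions in the test check both.

```python
def has_square_suffix(word):
    """True if the tuple `word` ends with a square uu, u non-empty."""
    n = len(word)
    return any(word[n - 2 * h:n - h] == word[n - h:] for h in range(1, n // 2 + 1))


def square_free_counts(n_max, x):
    """[P_0(x), ..., P_{n_max}(x)] from one depth-first enumeration."""
    counts = [0] * (n_max + 1)

    def extend(word):
        counts[len(word)] += 1
        if len(word) < n_max:
            for letter in range(x):
                candidate = word + (letter,)
                if not has_square_suffix(candidate):
                    extend(candidate)

    extend(())
    return counts


def remainders(n_max, x):
    """{n: R_{n-2}(x)} for 3 <= n < n_max, obtained by rearranging Eq. (4.3)."""
    p = square_free_counts(n_max, x)
    return {n: p[n + 1] - (x - 1) * p[n] + p[n - 1] for n in range(3, n_max)}


for x in range(3, 6):
    print(f"x = {x}:", remainders(8, x))
```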

Consequently, at least for large $x$, one would expect a reasonable approximation by simply
neglecting the lower-order remainder terms. So, let us consider polynomials $Q_n(x)$ defined by

$$Q_{n+1}(x) = (x-1)\,Q_n(x) - Q_{n-1}(x) \tag{4.4}$$

with initial conditions $Q_0(x) = 1$ and $Q_{-1}(x) = 0$. In this case, the generating function
can easily be calculated,

$$F(x,t) := \sum_{n=0}^{\infty} Q_n(x)\, t^n = \frac{1}{t^2 - (x-1)\,t + 1} \tag{4.5}$$



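by applying the recurrence relation (4.4) and taking the initial conditions properly into
account: multiplying (4.4) by $t^{n+1}$ and summing over $n \ge 0$ gives
$F - 1 = (x-1)\,t\,F - t^2 F$, since $Q_0 = 1$ and $Q_{-1} = 0$.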
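A quick way to convince oneself of (4.5) is to generate the $Q_n(x)$ from (4.4) for an integer value of $x$ and to check that multiplying the series by the denominator $1 - (x-1)t + t^2$ leaves only the constant term 1. The Python sketch below does this with exact integer arithmetic; the function names are hypothetical.

```python
def q_values(x, n_max):
    """[Q_0(x), ..., Q_{n_max}(x)] from Eq. (4.4) with Q_0 = 1, Q_{-1} = 0."""
    q_prev, q = 0, 1
    values = [q]
    for _ in range(n_max):
        q_prev, q = q, (x - 1) * q - q_prev
        values.append(q)
    return values


def check_generating_function(x, n_max=40):
    """Verify (1 - (x-1) t + t^2) * sum_n Q_n(x) t^n = 1 + O(t^(n_max+1))."""
    q = q_values(x, n_max)
    denominator = (1, -(x - 1), 1)  # coefficients of t^0, t^1, t^2
    for k in range(n_max + 1):
        coeff = sum(d * q[k - j] for j, d in enumerate(denominator) if k >= j)
        assert coeff == (1 if k == 0 else 0), (x, k, coeff)


for x in range(2, 13):
    check_generating_function(x)
print(q_values(4, 5))
```

For $x = 4$ the last line prints [1, 3, 8, 21, 55, 144], every second Fibonacci number; their growth rate is the square of the golden ratio, in line with the value of $s(4)$ in Table 3.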

For given $x$, the growth rate of the coefficients (which we know to converge) is given by the
inverse of the radius of convergence $\varrho$ of the generating function, i.e., for $x \ge 3$,
of the smaller root of $t^2 - (x-1)t + 1$. We thus obtain the simple formula



$$s(x) = \log\left(\frac{x-1+\sqrt{(x+1)(x-3)}}{2}\right). \tag{4.6}$$
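Formula (4.6) is easy to evaluate. The Python sketch below compares it with the $s(x)$ column of Table 3 and, as an independent cross-check, with $\log(Q_{n+1}(x)/Q_n(x))$ from the recurrence (4.4) at large $n$. The reference values are copied from Table 3; the function names are hypothetical.

```python
import math

# s(x) column of Table 3
TABLE3_S = {3: 0.00000000, 4: 0.96242365, 5: 1.31695790, 6: 1.56679924,
            7: 1.76274717, 8: 1.92484730, 9: 2.06343707, 10: 2.18464379,
            11: 2.29243167, 12: 2.38952643}


def s_approx(x):
    """Eq. (4.6): log of the inverse radius of convergence of F(x, t)."""
    return math.log((x - 1 + math.sqrt((x + 1) * (x - 3))) / 2)


def s_from_recurrence(x, n=300):
    """log(Q_{n+1}(x) / Q_n(x)) with Q_n from Eq. (4.4); tends to s(x) for x > 3."""
    q_prev, q = 0, 1
    for _ in range(n):
        q_prev, q = q, (x - 1) * q - q_prev
    # Exact big integers; true division of ints yields a correctly rounded float.
    return math.log(((x - 1) * q - q_prev) / q)


for x, expected in TABLE3_S.items():
    assert abs(s_approx(x) - expected) < 1e-7, x
    if x > 3:
        assert abs(s_from_recurrence(x) - s_approx(x)) < 1e-12, x
    print(x, f"s(x) = {s_approx(x):.8f}", f"log(x-1) = {math.log(x - 1):.8f}")
```

For $x = 3$ the cross-check is skipped on purpose: there $Q_n(3) = n+1$, so the ratio approaches 1 only like $1 + 1/n$.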

Although the first two terms in the asymptotic expansion of (4.6) match those of the upper
bound $\log(x-1) \simeq \log(x) - x^{-1}$, for finite $x$ it gives a much better approximation
to the true value of the entropy (see Table 3), except for $x = 3$, where $s(3) = 0$. In Table 3,
the lower bounds are those of Proposition 1 with $\varepsilon = \log(2)/21$, while the estimate
is obtained from counting square-free words in $x$ letters up to length $n_{\max}$ and
extrapolating the logarithm of the successive ratios; its error is roughly one unit in the last
digit. The upper bound is again strict and was calculated from Eq. (2.20), evaluated at
$n_{\max}$.
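To illustrate what the successive ratios look like for $x = 3$, the Python sketch below takes the last rows of Table 2, uses the sum of the three integer columns as $\omega^-_n$, and prints $\log(\omega^-_{n+1}/\omega^-_n)$. The raw values already lie close to the estimate 0.263719 in Table 3. The extrapolation step used for the table is not specified here, so the sketch deliberately stops at the raw ratios; the names `TABLE2_TAIL` and `log_ratios` are hypothetical.

```python
import math

# (omega_n^(0), omega_n^(1), omega_n^(2)) for n = 85..89, copied from Table 2
TABLE2_TAIL = {
    85: (2548490760, 43208866152, 23425039704),
    86: (3317442636, 56247641232, 30493861692),
    87: (4318568760, 73220999274, 39695796582),
    88: (5621734092, 95316302484, 51674555862),
    89: (7318253526, 124079305572, 67267855110),
}


def log_ratios(table):
    """{n+1: log(omega_{n+1} / omega_n)} for consecutive n, omega_n being the row sum."""
    totals = {n: sum(row) for n, row in table.items()}
    return {n + 1: math.log(totals[n + 1] / totals[n])
            for n in sorted(totals) if n + 1 in totals}


for n, value in log_ratios(TABLE2_TAIL).items():
    print(n, f"{value:.6f}")
```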

The convergence of $s(x)$ towards the estimated entropy is rather quick as $x$ grows (see
Table 3), and the above arguments indicate that $s(x)$ is asymptotically exact, a property that
deserves further investigation.
Table 3: Bounds and estimates for the entropy of square-free words in x letters.

 x  n_max  lower bound   estimate  upper bound    log(x-1)        s(x)
 3     90   0.03300701   0.263719   0.27825962  0.69314718  0.00000000
 4     26   0.26405607    0.96375   0.97319304  1.09861229  0.96242365
 5     21   0.43734286   1.317089   1.32225469  1.38629436  1.31695790
 6     18   0.57597230    1.56682   1.57028618  1.60943791  1.56679924
 7     16   0.69149683    1.76275   1.76530829  1.79175947  1.76274717
 8     16   0.79051786   1.924850   1.92663981  1.94591015  1.92484730
 9     15   0.87716125   2.063438   2.06486642  2.07944154  2.06343707
10     14   0.95417761   2.184644   2.18583786  2.19722458  2.18464379
11     12   1.02349232    2.29243   2.29357100  2.30258509  2.29243167
12     12   1.08650570    2.38953   2.39045454  2.39789527  2.38952643


5 Concluding remarks 

In this article, we have discussed various aspects of the ensemble of square-free words in a
finite alphabet with $x$ letters, with some emphasis on the entropy density in the
thermodynamic limit. Although we could give various rigorous bounds for the entropy (which is,
in particular, strictly positive for $x > 2$), we were able neither to solve the problem
analytically nor to construct an exhaustive lower bound (whereas this is easy for the upper
bound). Nevertheless, we gave an approximate generating function that yields an entropy
estimate which is asymptotically exact and astonishingly accurate already for small $x > 3$.

Several questions remain open. Although standard criteria point to non-solvability of the
problem (in the sense that the generating function has its circle of convergence as a natural
boundary), this needs further investigation. Here, a better understanding of the lower bound
would help, because it would shed more light on possible methods to essentially exhaust the
square-free words, at least with respect to their exponential growth. We hope to report on
further findings in this direction in the near future.